feature(Types): added Experiment type encompassing Pipelines, metrics, project_name, etc. #114
Conversation
```python
save_remote: Optional[
    bool
] = None,  # If True, all models will try uploading (if configured); if False, it overrides uploading for any model (even if configured)
remote_logging: Optional[
```
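For illustration, a minimal sketch of how that tri-state override could resolve per model (the helper name and the `None` fallback are assumptions, not code from this PR):

```python
from typing import Optional

def should_upload(save_remote: Optional[bool], model_configured: bool) -> bool:
    # False is a hard override: never upload, even if the model is configured.
    if save_remote is False:
        return False
    # True and None both defer to each model's own configuration
    # (assumed; the docstring above only pins down the True/False cases).
    return model_configured
```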
It looks like remote_logging was not used anywhere! Did we have it wired up on latest main?
It is used in one place: in run.py!
```python
logger_plugins = (
    [
        WandbPlugin(
            WandbConfig(
                project_id=project_id,
                run_name=config.run_name + "-" + pipeline.id,
                train=True,
            ),
            dict(
                run_config=config.get_configs(),
                preprocess_config=preprocess_config.get_configs(),
                pipeline_configs=pipeline.get_configs(),
            ),
        )
    ]
    if config.remote_logging
    else []
)
```
Sorry, my mistake! I meant save_remote, not remote_logging!
```python
self.plugins = obligatory_plugins + plugins

self.pipeline = overwrite_model_configs(self.config, self.pipeline)
self.pipeline = experiment.pipeline  # maybe we want to deepcopy it first?
```
Question: should we deepcopy the pipeline before we modify it?
I think you may have already asked this question.
It can be deepcopied, although it won't make a difference in practice, because we run the function with each run as well.
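If we ever do want the copy, a minimal sketch using the names from the quoted diff would be:

```python
from copy import deepcopy

# Copy first, so overwrite_model_configs mutates this run's own instance
# rather than the pipeline object stored on the Experiment.
self.pipeline = deepcopy(experiment.pipeline)
self.pipeline = overwrite_model_configs(self.config, self.pipeline)
```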
```python
run_name: str  # Gets appended as a prefix before the pipeline name
train: bool  # Whether the run should do training
dataset: pd.DataFrame
pipeline: "Pipeline"
```
I needed to do a forward declaration here, otherwise we're in for some dependency cycle fun!
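For context, the usual pattern looks something like this (the module path is hypothetical, not from this PR):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for type checking, so the module is never loaded
    # at runtime and no circular import can occur.
    from runner.pipeline import Pipeline  # hypothetical path

class Experiment:
    # The string annotation is resolved lazily by type checkers.
    pipeline: "Pipeline"
```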
```python
run_name: str  # Gets appended as a prefix before the pipeline name
train: bool  # Whether the run should do training
dataset: pd.DataFrame
pipeline: "Pipeline"
metrics: Evaluators
```
metrics looks relatively useful here, as it may change based on the experiment?
```python
run_name: str  # Gets appended as a prefix before the pipeline name
train: bool  # Whether the run should do training
dataset: pd.DataFrame
pipeline: "Pipeline"
metrics: Evaluators
preprocessing_config: PreprocessConfig
```
So actually there could be multiple preprocessing_configs if we merge the data. In theory this is incorrect; in practice, it's probably fine to pass in the one we want to log.
You mean in case there are multiple initial data sources?
Right now, multiple ones that we merge together. But later there'll be multiple data sources too; I completely forgot about that as well!
The downside of this is that if I only want to change the pipeline we're running but keep the rest intact, I'll need a loop or list comprehension to create the different Experiments.
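For example, assuming Experiment is a dataclass (base_experiment and pipelines are hypothetical names), that would look something like:

```python
from dataclasses import replace

# One Experiment per pipeline; every other field is carried over unchanged.
experiments = [replace(base_experiment, pipeline=p) for p in pipelines]
```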
👍 Looks good
Closes #115
Closes #109